home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Language/OS - Multiplatform Resource Library
/
LANGUAGE OS.iso
/
cpp_libs
/
varia
/
grammar1.lha
/
c++grammar1.1
/
grammar4.txt
< prev
next >
Wrap
Text File
|
1990-06-07
|
69KB
|
1,635 lines
Copyright (C) 1989.1990, James A. Roskind, All rights reserved.
Abstracting with credit is permitted. The contents of this file may
be reproduced electronically, or in printed form, IN ITS ENTIRETY
with no changes, providing that this copyright notice is intact and
applicable in the copy. No other copying is permitted without the
expressed written consent of the author.
FILENAME: GRAMMAR4.TXT
AUTHOR: Jim Roskind
Independent Consultant
516 Latania Palm Drive
Indialantic FL 32903
(407)729-4348
jar@ileaf.com
or ...!uunet!leafusa!jar
A YACC-able C++ 2.0 GRAMMAR, AND THE RESULTING AMBIGUITIES
(Release 1.1 Updated 6/90)
ABSTRACT
This paper describes ambiguous aspects of the C++ language that have
been exposed by the construction of a YACC-able grammar for C++. The
grammar is provided in a separate file, but there are extensive
references to the actual grammar. In light of the fact that not all
readers will have access to a YACC capable of processing my grammar,
I have also included (in an appendix) a significant excerpt from the
verbose output of running YACC on my grammar.
This release of the grammars is provided to add some final touches
identified by the many users that picked up my original posting in
3/90. As with my first posting, nested types have been omitted from
this release of the grammar. The decision to incorporate nested
types into the C++ language was only clarified with AT&T's release of
version 2.1 of C++. Since the addition of nested types appears to
notably complicate the interaction required between the parser and
the lexer, it seemed appropriate to release an update to the grammar
that tries to correct all known deficiencies found in my first
release. A future release should incorporate all enhancements needed
for nested types.
Theoretical Note (skip if you hate theory): My C++ grammar does not
make use of the %prec or %assoc features of YACC, and hence conflicts
are never hidden within my grammar. The wondrous result is that, to
the extent to which my grammar can be seen to span the syntax of the
language, in all places where YACC reports no conflicts, the grammar
is a totally unambiguous statement of the syntax of the language.
This result is remarkable (I think), because the "general case" of
deciding whether a grammar is ambiguous or not, is equivalent to (the
unsolvable) halting problem.
Although this paper is terse, and perhaps poorly (or at least
hastily) formed, I believe that its content is so significant to my
results that it had to be included. I am sorry that I do not have
the time at this point to better arrange its contents.
CONTENTS
INTRODUCTION
REVIEW: STANDARD LEXICAL ANALYSIS HACK: TYPEDEFname VS IDENTIFIER
STATUS OF MY "DISAMBIGUATED" GRAMMAR
18 EASY CONFLICTS, WITH HAPPY ENDINGS
2 REDUCTION CONFLICTS WITH STRAIGHT FORWARD DISAMBIGUATION
3 NOVEL CONFLICT THAT YIELD TO SEMANTIC DISAMBIGUATION
THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY
SAMPLE RESOLUTIONS OF AMBIGUITIES BY MY C++ GRAMMAR
DIFFICULT AMBIGUITIES FOR C++ 2.0 PARSER TO TRY
COMMENTARY ON CURRENT C++ DISAMBIGUATING RULES
SOURCE OF CONFLICTS IN C++ (MIXING TYPES AND EXPRESSIONS)
FUNCTION LIKE CAST vs DECLARATION : THE TOUGH EXAMPLE
CONCLUSION
APPENDIX A:
PROPOSED GRAMMAR MODIFICATIONS (fixing '*', '&' and ':'
conflicts)
APPENDIX B:
AMBIGUITIES IN MY C++ GRAMMAR AS LISTED BY YACC VERBOSE OPTION
APPENDIX C:
ADVANCED TUTORIAL ON YACC CONFLICTS
INTRODUCTION
This paper is intended to go into significant detail about the
ambiguities that I have seen in the C++ 2.0 language, and exposed by
attempting to develop a YACC compatible (i.e., LALR(1)) grammar. I
must point out that this is NOT meant as an attack on the language or
the originators thereof, but rather an R&D effort to disambiguate
some details. (I personally believe that Stroustrup et. al. are
doing a GREAT job). I have the vague hope that the extensive hacking
that I have done with YACC on the C++ grammar has given me a rather
novel vantage point. (I have expressed my observations to Bjarne
Stroustrup, and in doing so verified that other folks had not
previously identified several of my points). In light of my
activities, this paper includes a fair amount of philosophizing. I
hope that none of this paper assumes too greatly that I have insight
that is beyond that of the readers, and certainly no insults are
intended.
If you like investigating grammars directly (rather than reading what
I have to say), I would strongly suggest you read the ANSI C grammar
before looking at the C++ grammar. The C++ grammar is pretty large,
and you can expect the following stats if you have a similar YACC to
that I am using:
Berkeley YACC (1.4 2/25/90):
25 shift/reduce conflicts, 7 reduce/reduce conflicts.
103 terminals, 161 nonterminals
609 grammar rules, 1157 states
AT&T YACC (with large table sizes):
123/127 terminals, 160/200 nonterminals
609/650 grammar rules, 1157/1200 states
25 shift/reduce, 7 reduce/reduce conflicts reported
243/350 working sets used
memory: states,etc. 13727/60000, parser 10871/12000
495/600 distinct lookahead sets
756 extra closures
7300 shift entries, 25 exceptions
1797 goto entries
4731 entries saved by goto default
Optimizer space used: input 18036/60000, output 8092/12000
8092 table entries, 3568 zero
maximum spread: 351, maximum offset: 1148
REVIEW: STANDARD LEXICAL ANALYSIS HACK: TYPEDEFname VS IDENTIFIER
This section is targeted at readers that are not familiar with
parsing C with a context free grammar. The reader should note that
there are two distinct forms of `identifiers' gathered during lexical
analysis, and identified as terminal tokens in both of my grammars.
The terminal tokens are called IDENTIFIER and TYPEDEFname
respectively. This distinction is a required by a fundamental element
of the C language. The definition of a "TYPEDEFname" is a lexeme
that looks like a standard identifier, but is currently defined in
the symbol table as being declared a typedef (or class, struct, enum
tag) name. All other lexemes that appear as identifiers (and are not
keywords) are tokenized as IDENTIFIERs. The remainder of this
section will review the motivation for this approach.
Consider the following sample code, which uses the C language:
...
int main() { f (*a) [4] ; ...
Most reader will intuitively parse the above sequence as a function
call to "f", with an argument "*a", the return value of which is
(probably) a pointer, and hence can be indexed by "4". Such an
interpretation presumes that the prior context was something like:
...
extern float *a;
char * f(float);
int main() { f (*a) [4] ; ...
However, if the prior context was **INSTEAD**:
...
typedef int f;
extern float a;
int main() { f (*a) [4] ; ...
then the interpretation of the code is entirely different. The
fragment in question actually redeclares a local variable "a", such
that the type of "a" is "pointer to array of 4 int's"! So we see in
this example that the statement "f(*a)[4];" cannot be parsed
independent of context.
The standard solution to the above ambiguity is to allow the lexical
analyser to form different tokens based on contextual information.
The contextual information that is used is the answer to the
question: "Is a given identifier defined as a typedef name (or class
name) at the current point in the parse"? I will refer to this
feedback loop (from the parser that stores information in a symbol
table, wherein the lexer extracts the information) as the "lex hack".
With this lex hack in place the code fragment "f(*a)[4];" would be
provided by the lexer as either:
IDENTIFIER '(' '*' IDENTIFIER ')' '[' INTEGERconstant ']' ';'
or
TYPEDEFname '(' '*' IDENTIFIER ')' '[' INTEGERconstant ']' ';'
The two case are very easy for a context free grammar to distinguish,
and the ambiguity vanishes. Note that the fact that such a hack is
used (out of necessity) demonstrates that C is not a context free
language, but the hack allows us to continue to use an LR(1) parser,
and a context free grammar.
Note that this hack is, of necessity, also made use of in the C++
grammar, but no additional feedback (a.k.a.: hack) is required. The
interested reader should also note that this feedback loop (re:
updating the symbol table) must be swift, as the lexical analyser
cannot proceed very far without this contextual information. This
constraint on the feedback time often prevents the parser from
"deferring" actions, and hence increases the pressure on the parser
to rapidly disambiguate token sequences.
(Although this version of the C++ grammar does not support nested
types, the advanced reader is encouraged to consider their impact on
the above problem. Specifically, code fragments such as "T::f" must
syntactically distinguish themselves as either typenames, or members.
Problems in this area can be expected to stress the above hack almost
to its breaking point. The resolution to this problem will be
discussed in the next release of the grammar,... so stay tuned.).
STATUS OF MY "DISAMBIGUATED" GRAMMAR
My grammar is now fairly complete. Several independent reviews,
which provided a front end lexical analyser and parsed existing C++
code, have verified that the grammars span of the C++ language
appears complete. I have not incorporated parametric types nor
exception handling into the grammars at this point, as they continue
to be in a state of flux.
The grammar does (I believe) support all the features provided in C++
2.0, including multiple inheritance and the enhanced operator "new"
syntax (includes placement expression). I believe that, except for
the minor change involving not permitting parentheses around bit
field names during a declaration, my C++ grammar supports a superset
of my ANSI C grammar. Note that I haven't inline expanded all the
rules in the C grammar that were required for C++ disambiguation (re:
deferring reductions), and hence a `diff' of the two grammars will
not provide a trivial comparison. The resulting major advantage of
this grammar over every other current C++ parser (that I know of) is
that it supports old style function definitions AS WELL AS all the
standard C++. (It is my personal belief that such support was
dropped by many compilers and translators in order to resolve the
many syntax problems that appear within C++. I believe this grammar
shows that such a decision was premature).
My list of shift-reduce and reduce-reduce conflicts is currently: 25
shift/reduce, 7 reduce/reduce conflicts reported. I have chosen to
leave so many conflicts in the grammar because I hope to see changes
to the syntax that will remove them, rather than making changes to my
grammar that will firmly accept and disambiguate them. (Considering
the detailed analysis presented here, such changes would only add
unnecessary complications to the grammar).
8 SR caused by operator function name with trailing * or &
8 SR caused by freestore with trailing * or &
1 SR caused by operator function name OR freestore, with trailing :
1 SR caused by dangling else and my laziness
1 SR and 1 RR caused by operator function name and trailing {
3 SR caused by constructor declaration vs member declaration
3 RR caused by function-like cast vs identifier declaration ambiguity
3 RR caused by function-like cast vs typedef redeclaration ambiguity
3 SR caused by redundant parened TYPEDEFname redeclaration vs old style cast
Of these conflicts, the ones that most C++ parser authors are mainly
concerned with are the last 9 conflicts, relating to
function-like-cast vs declaration, and redundant parened TYPEDEFname
redeclaration vs old style cast. The following sections breeze
through the "easy" conflicts, and then talk at length about the 9
tough ones.
18 EASY CONFLICTS, WITH HAPPY ENDINGS
The first group of 18 SR conflicts:
8 SR caused by operator function name with trailing * or &
8 SR caused by freestore with trailing * or &
1 SR caused by operator function name OR freestore, with trailing :
1 SR caused by dangling else and my laziness
have very simple resolutions. If you are reading this, I assume that
you are already familiar with the if-if-else conflict.
The trailing ':' has to do with the use of the colon in a ternary
"... ? ... : ..." expression. The easiest example is:
void * p = boolean ? new struct S : ...
The problem is that this MIGHT be the first mention of "struct S",
and this MIGHT be where the elaboration of the structure is provided!
Under such circumstances, what follows the ':' MIGHT be a base class
name, and the entire curly braced elaboration for S (ugh!).
My resolution of this SR conflict is that "the longest possible type
is constructed by the parser". Hence the ':' is construed to be part
of the type "struct S". This is in keeping with the subtle statement
on in section 7.1 of the C++ 2.0 Ref Man: "The longest sequence of
decl-specifier that could possibly be a type name is taken as the
decl-specifiers of a declaration".
The 8 conflicts based "freestore with trailing * or &" can be hinted
at by the example:
a = new int * * object;
Is the above the same as:
a = (new int) * (* T);
or:
a = (new (int *)) * T;
Again the "longest possible type" is isolated by my grammar. The
result is:
a = (new (int * * )) ...
which leads to a syntax error in my example! This resolution is
indeed what is quietly specified for C++ 2.0 (and implemented in
cfront). The critical statement and example in the C++ 2.0 Ref Man
at the end of section 5.3.3 makes this resolution clear.
The 8 conflicts involving "operator function names with trailing * or
&" are quite similar to what was just presented. The critical fact
is that "operator typename" is allowed in the grammar to define a
function. Whenever a function is provided, but NOT followed by a
'(', the address of the function is implicitly taken and used in the
expression (see draft ANSI C standard for finer details). For some
class T, the following MIGHT all be valid statements:
operator T;
operator T*;
operator T**;
If the above are valid, then the interpretation of the following is
ambiguous:
operator T * * a;
The above might be interpreted as:
(operator T) * (* a);
or
(operator (T *)) * a;
The default LR rules parse the largest possible type, and lead to:
(operator (T * * )) ...
which in our example leads to a syntax error. Here again the
"longest possible type..." rule supports my grammar. Note that this
rule is actually a consequence (in my opinion) of the cfront
implementation via a YACC grammar, and the default resolution of
conflicts in that grammar.
2 REDUCTION CONFLICTS WITH STRAIGHT FORWARD DISAMBIGUATION
The two conflicts that were described as:
1 SR and 1 RR caused by operator function name and trailing {
occur when there is an ambiguity as to whether the '{' is the start
of a function body in a function definition, or the start of a
structure/class/enum elaboration. In part, this ambiguity is caused
by the fact that an arbitrary declarator is used in a function
definition, but semantics require the declarator in a definition to
have type "function returning ...". This subtlety causes the
following to be a syntactically valid function definition, even
though it is semantically invalid:
void func { int a; } ...
Semantic requirements on the declarator demand something more like:
void func() { int a; } ...
By using "operator struct S" in place of "func" in the first example
we get:
void operator struct S { int a; } ...
My C++ grammar attempts to assemble the longest possible type, and
hence parses this example as equivalent to:
void operator (struct S { int a;} ) ...
Interestingly enough, this resolution not only supports the "longest
possible type" rule, but also avoids the semantically invalid parse.
3 NOVEL CONFLICTS THAT YIELD TO SEMANTIC DISAMBIGUATION
The conflicts that are discussed in this section have been deferred
(by A LOT of work, and A LOT of inline expansion) to occur when a ';'
is reached. At that point, semantic information in the tokens can
safely be used to decide which of two cases are at hand.
The conflicts referred to as:
3 SR caused by constructor declaration vs member declaration
occur during a class/struct elaboration. Consider the following
class elaborations:
typedef int T1, T2, T3 ;
class GOO { int a;} ;
class FOO {
GOO T1 ; // clearly a redefinition of T1
FOO ( T2 ); // clearly a constructor
GOO ( T3 ); // redefinition of T3
};
Note that the last two entries in FOO's elaboration "FOO(T2);" and
"GOO(T3);" are tokenized IDENTICALLY, but must have dramatically
different meanings. When I first found this ambiguity I was hopeful
that I could extend the lex hack that distinguishes TYPEDEFnames from
random IDENTIFIERs, and distinguish something like CURRENTCLASSname.
Unfortunately, the potential for elaborations within elaborations
appears to make such a hack unworkable. In addition, once I got my
grammar to defer all such ambiguous cases until a ';' was seen, I
felt confident that the ambiguity was resolved (and the introduction
of an additional "hack" was unnecessary).
THE TOUGH AMBIGUITIES: FUNCTION LIKE CASTS AND COMPANY
The ambiguities listed in this section pertain to attempts to
distinguish declaration/types-names from expressions names. For
example:
char *b ="external" ; // declare a variable to confuse us :-)
main () {
class S;
S (*b)[5]; // redeclare "b" pointer to array of 5 S's ?
// OR ELSE indirect through b; cast to S; index using 5 ?
}
The above is what I call the "declaration vs function like cast
ambiguity". Awareness about this ambiguity in this context appears
fairly widespread among C++ parser authors. The C++ 2.0 Ref Man
makes explicit reference to this problem in section 6.8 "Ambiguity
Resolution". I believe the underlying philosophy provided by the
Reference Manual is that if a token stream can be interpreted by an
ANSI C compiler to be a declaration, then a C++ compiler should
disambiguate in favor of a declaration. Unfortunately, section 6.8
goes on to say:
"To disambiguate, the whole statement may have to be examined to
determine if it is an expression-statement, or a declaration.
... The disambiguation is purely syntactic; that is, the meaning
of the names, beyond whether they are type-names or not, is not
used in the disambiguation".
The above advice only forestalls the inevitable ambiguity, and
complicates the language in the process. The examples that follow
will demonstrate the difficulties.
There are several other contexts where such ambiguities (typedef vs
expression) arise:
1) Where a statement is valid (as shown above).
2) As the argument to sizeof()
3) Following "new", with the C++ syntax allowing a placement
expression
4) Immediately following a left paren in an expression (it
might be an old style cast, and hence a type name)
5) Following a left paren, arguments to constructors can be
confused with prototype type-names.
6) Recursively in any of the above, following a left paren
(what follows might be argument expressions, or might
be function prototype parameter typing)
Examples of simple versions of the sizeof context are:
class T;
sizeof ( T ); // sizeof (type-name)
sizeof ( T[5] ); // again a type name
sizeof ( T(5) ); // sizeof (expression)
sizeof ( T() ); // semantic error: sizeof "function returning T"?
// OR ELSE sizeof result of function like cast
Examples of the old style cast ambiguity context, which are still
ambiguous when the '(' after the 'T' has been seen:
class T { /* put required declarations here */
};
a = (T( 5)); // function like cast of 5
b = (T( )) 0; // semantic error: cast of 0 to type "function
// returning T"
In constructors the following demonstrates the problems:
class T;
T (b)(int d ); // same as "T b(int);", a function declaration
T (d)(int (5)); // same as "T d(5);", an identifier declaration
T (d)(int ( )); // ambiguous
The problem can appear recursively in the following examples. By
"recursively" I mean that an ambiguity in the left-context has left
the parser unsure of whether an "expression" or a "type" is being
parsed, and the ambiguity is continued by the token sequence. After
the parser can determine what this subsequence is, it will in turn be
able to disambiguating what the prior tokens were.
Recursion on the statement/declaration context:
class S;
class T;
S (*b)(T); // declare b "pointer to function taking T returning S"
S (*c)(T dummy); // same declaration as for "b"
int dummy;
S (*d)(T (dummy)); // This T might be casting dummy
Recursion on the sizeof context is shown in the following examples.
As before, the examples include semantic errors.
class T;
class S;
sizeof ( T(S dummy) ); // sizeof "function taking S returning T"
int dummy;
sizeof ( T(S (dummy)) ); // sizeof "function taking S returning T"
// OR ELSE cast dummy to S, and then cast that to T, which
// is the same as "sizeof T;"
The following are the precise conflicts, along with typical contexts.
I have derived the contexts by manually walking backwards through my
verbose YACC output. Note that I have deleted some of the
insignificant rules from the verbose state descriptions in this
section (insignificant in that they are not involved in the
conflict). To see the complete details of each conflict state see
the appendix at the end of this paper.
WARNING: THE REMAINDER OF THIS SECTION IS VERY DETAILED AND TERSE
(PERHAPS EVEN CRYPTIC); DO NOT TRY TO READ IT CAREFULLY UNLESS YOU
HAVE A LOT OF TIME TO KILL, AND A LOT OF INTEREST IN THE GRAMMAR.
(You can skip to the next section, which is a bit less technical and
terse).
---------------------------------------------------------------------
Minimal left context: "main() { int ( identifier"
Is the identifier being declared?
Is the identifier a function name?
Is the identifier an array name?
(The last two cases use function like casting into
type "int")
Left context can include an arbitrary number of '*', '&', or
'(' immediately to the left of the identifier. The
basic.type.name "int" can also be any
simple.type.name (e.g., a TYPEDEFname)
My Default is to become a declarator, which then forms a
declaration
642: reduce/reduce conflict (red'ns 17 and 22 ) on (
642: reduce/reduce conflict (red'ns 17 and 22 ) on )
642: reduce/reduce conflict (red'ns 17 and 22 ) on [
state 642
paren.identifier.declarator : rescoped.identifier_ (17)
primary.expression : rescoped.identifier_ (22)
---------------------------------------------------------------------
Required left context: "... ( TYPEDEFname ()",
where "..." includes a type.specifier.
It is assumed that "function returning TYPEDEFname"
is a valid "type.name". Semantically this is
rarely legal, but the focus is on syntax here.
The above sequence could eventually parse into:
... ( declarator )
... ( type.name ) cast.expression
... ( expression )
... ( parameter.decl.list )
Note that parameter.decl.list is:
type.name
type.name = default.value , ...
type.name , parameter.decl.list
Expanded examples are:
sizeof ( expression )
sizeof ( type.name )
where the argument to "sizeof" is one of the following:
int ( typename2 ( TYPEDEFname() )
int ( typename2 ( TYPEDEFname() ,
int ( typename2 ( TYPEDEFname() = expression
int ( TYPEDEFname()
Is the TYPEDEFname a declaration specifier?
Is TYPEDEFname used as function like cast of 0 args?
Left context can include an arbitrary number of '*', '&', or
'(' immediately to the left of the "(TYPEDEFname".
Default is to become a type.qualifier.list,
which becomes the trailing section of a parameter.type.list,
which becomes a function.returning.modifier for a declarator,
which forces a redeclaration of TYPEDEFname,
(semantics may outlaw this redeclaration of TYPEDEFname
as a function WITHOUT use of "extern"!!!)
740: reduce/reduce conflict (red'ns 74 and 64 ) on )
740: reduce/reduce conflict (red'ns 74 and 64 ) on ,
740: reduce/reduce conflict (red'ns 74 and 64 ) on =
state 740
postfix.expression : TYPEDEFname ( )_ (74)
parameter.type.list : ( )_type.qualifier.list.opt
type.qualifier.list.opt : _ (64)
---------------------------------------------------------------------
Minimal left context: "main() { int ( ( TYPEDEFname"
Is the TYPEDEFname being redeclared?
Is TYPEDEFname in parenthesis the start as old style cast?
Left context can include an arbitrary number of '*', '&', or
'(' between the two '('s. The basic.type.name "int"
can also be any simple.type.name (e.g., a TYPEDEFname)
Default is to become a typedef.declarator,
which leads to the redeclaration of the TYPEDEFname
782: shift/reduce conflict (shift 595, red'n 418) on )
state 782
type.name : TYPEDEFname_ (418)
simple.paren.typedef.declarator : ( TYPEDEFname_)
---------------------------------------------------------------------
Minimal left context: "main() { int ( ( TYPEDEFname[2]"
OR
Minimal left context: "main() { int ( ( TYPEDEFname(float)"
Is the TYPEDEFname being redeclared?
Is "array of TYPEDEFname" the type for
an old style cast? The result of the cast
expression will undergo a function like cast into
an int.
Left context can include an arbitrary number of '*', '&', or
'(' between the two '('s. The basic.type.name "int"
can also be any simple.type.name (e.g., a TYPEDEFname)
Default to form a typedef.declarator,
which leads to a redeclaration of the TYPEDEFname.
(Semantics preclude cast to this type anyway, so we have
actually syntactically disallowed a semantic error)
874: shift/reduce conflict (shift 838, red'n 591) on )
state 874
paren.postfix.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator_)
abstract.declarator : postfixing.abstract.declarator_ (591)
---------------------------------------------------------------------
Minimal left context: "main() { int ( * ( TYPEDEFname"
Is the TYPEDEFname being redeclared?
Is TYPEDEFname in parenthesis the start as old style
cast? The result of the cast will undergo and
indirection, and the result of that will be
cast to int by a function like cast!
Left context can include an arbitrary number of '*', '&', or
'(' to the left of the '*'. The basic.type.name "int"
can also be any simple.type.name (e.g., a TYPEDEFname)
Default to form a typedef.declarator,
which leads to a redeclaration of the TYPEDEFname.
875: shift/reduce conflict (shift 840, red'n 418) on )
state 875
type.name : TYPEDEFname_ (418)
paren.typedef.declarator : indirect.or.reference ( TYPEDEFname_)
---------------------------------------------------------------------
SAMPLE RESOLUTIONS OF AMBIGUITIES BY MY C++ GRAMMAR
Of the "hard examples" given in the C++ reference manual (r6.8), my
grammar can only "properly" detect a "statement-expression" for the
stream:
T(a,5)>>c;
All the other examples default to a declarator after the closing
parenthesis following the identifier. (See my comments in the
conclusion section of this paper).
I actually am not sure I agree with all the examples in the C++ 2.0
Reference Manual. Specifically, the example in section 6.8:
T (*d) (double(3)); // expression statement
In the example "T" is specified to be a simple-type-name, which
includes all the basic types as well as class-names, and more.
Considering the following are valid declarations:
void *a (0);
void *b (int(0));
void (*c)(int(0));
I am unable to see the "syntactic" difference between this last token
stream and the example just cited in the reference manual. My
simplistic parser gives me the result that I at least expect. It
concludes (prematurely, but seemingly correctly) that the stream is a
declaration (with a new style initializer).
As a positive note, my grammar is able to parse the example given a
while back in comp.lang.c++, that Zortech 1.07 cannot parse:
a = (double(a)/double(b))...;
Apparently, upon seeing "(double" some parsers commit to a
parenthesized type-name for a cast expression, and cannot proceed to
parse a parenthesized expression. No mention of this problem is
listed in my conflict list, as resolution of this problem is simply a
matter of letting the LR parser wait long enough before committing.
Specifically, my grammar has not yet committed when all of:
a = (double(a)
has been seen! The next character ('/') allows the grammar to
unambiguously conclude that the sequence "double(a)..." is an
expression.
DIFFICULT AMBIGUITIES FOR A "C++ 2.0" COMPATIBLE PARSER TO TRY
Having seen the above contexts, I would be curious to see if other
C++ front ends with "smart lexers" (such as cfront) can handle the
following. These examples are not guaranteed to be evaluated
correctly by my grammar, but I expect them to demonstrate weaknesses
in many other parsers. The interpretation of these examples per C++
2.0 definitions requires massive lookahead. In addition, the
examples are generally unreadable by humans, and rarely parsed the
same way by any two implementations.
main()
{
class T
{
/* ... */
} a;
typedef T T1,T2,T3,T4,T5,T7,T8,T9,Ta,Tb,Tc,Td;
{ /* start inner scope */
T((T1) ); // declaration
T((T2) a ); // Statement expression
T((T3)( )); // declaration of T3
T((T4)(T )); // declaration of T4
T((T5)(T a )); // declaration of T5
T((T6)(T((a) ))); // declaration of T6
T((T7)(T((T) ))); // declaration of T7
T((T8)(T((T)b))); // statement expression
T(b[5]); // declaration
T(c()); // declaration
T(d()[5]); // statement expression ? (function returning array
// is semantically illegal, but syntactically proper)
T(e[5]()); // statement expression ? (No array of functions)
T(f)[5](); // statement expression ? " "
T(*Ta() )[5] [4]; //declaration
T(*Tb() [5]) [4]; //statement expression ? (function returning array)
T(*Tc()) [3 +2]; //declaration
T(*Td()) [3 ]+2; //statement expression
}
}
COMMENTARY ON C++ 2.0 DISAMBIGUATING RULES
There are two distinct thrusts in conflict disambiguation as provided
by AT&T's efforts to define a standard for C++. The first thrust is
"parse tokens into the longest possible declarator, and identify the
syntax errors that result". The second thrust is to "use massive
external technology ("smart lexer", a.k.a.: "recursive decent parser
that helps the lexer", a.k.a. LALEX) to look ahead, so that the
parser doesn't mis-parse a function-like-cast as a declaration and
induce a syntax error". The first is a commitment to LR parser
technology, and an existing grammar (which could be cleaned up). The
second is a commitment to NOT use an LR parser, and to the use of an
existing implementation.
It is my belief that LR parsers are well understood, and the addition
of a "smart lexer" destroys all structure in a parser. The result
can be anticipated to become a quagmire of code and hacks. With this
firm conviction, I have provided my grammar in the hopes that a
standard can emerge that IS well defined, and is implementable, and
is readable by humans.
SOURCE OF CONFLICTS IN C++ (MIXING TYPES AND EXPRESSIONS)
One fundamental strength in C is the similarity between declarations
and expressions. The syntax of the two is intended to be very
similar, and the result is a clean declaration and expression syntax.
(It takes some getting used to, but it is in my opinion good).
Unfortunately, there are some slight distinctions between types and
expressions, which Ritchie et. al. apparently noticed. It is for
this reason (I am guessing) that the C cast operator REQUIRES that
the type be enclosed in parenthesis. Moreover, there is also a clear
separator in a declaration between the declarator and the
initializing expression (the '=') (as some of you know, there is some
interesting history in this area.). The bottom line (as seen with
20-20 hindsight) is: "keep declarations and expressions separate".
Each violation of this basic rule has induced conflicts.
To be concrete about the differences between types and expressions,
the following two distinctions are apparent:
1) Abstract declarators are permitted. No analogy is provided in
expressions. The notable distinction is that abstract
declarators include the possibility of trailing '*' tokens.
2) The binding of elements in a declaration is very different
from any expression. Notably, the declaration-specifiers are
bound separately to each declarator in the comma separated list
of declarators (example: int a, b, c;). With (most forms of)
expressions, a comma provides a major isolation between
expressions.
C also used reserved names to GREATLY reduce the complexity of
parsing. The introduction of typedef names increased the complexity
(it made the language context sensitive), but a simple hack between
lex and YACC overcame the problem. An example is the statement:
name (*b)[4];
Note that this is ambiguous, EVEN in ANSI C, IF there is no
distinction between type-names and function names! (i.e., "b" could
be getting redeclared to be of type "pointer to array of name", OR
the function "name" could be called with argument "*b", the result of
which is indexed to the 4th element). In C, the two kinds of names
(TYPEDEFnames and function names (a.k.a.: declared identifiers))
share a name space, and at every point in a source program the (hack)
contextual distinction can be made by the tokenizer. Hacks are neat
things in that the less you use them, the more likely they are to
work when you REALLY need them (i.e., you don't have to fight with
existing hacks). Having worked on designing and implementing a C
compiler, I was pleasantly amazed at how the constructs all fell
together.
The major violations of this approach (i.e., keep declaration
separate from expressions) that come to mind with C++ are:
function-like-casts,
freestore expressions without parens around the type,
conversion function names using arbitrary type specifiers,
parenthesized initializers that drive constructors.
The last problem, parenthesized initializers, provides two areas of
conflicts. The first is really an interference issue with old style
C function definitions, which only bothers folks at file scope (GNU's
G++ compiler considered this to be too great an obstacle, and they
don't currently support old style C definitions!). The second part
of this conflict involves a more subtle separation between the
declarator, and the initializer. (K&R eventually provided an equal
sign as an unequivocal separator, but parens used in C++ are already
TOO overloaded to separate anything). The significance of this lack
of a clear separator is that it is difficult to decide that the
"declarator" is complete, and that the declared name should be added
to the scope. The last problem does interact in a nasty way with the
function-like cast vs declaration conflicts (the problem slows the
feedback loop to the symbol table, which is critical to continued
lexing). The parened initializers also provide another context where
it is difficult to distinguish between expressions (a true argument
list for the constructor) and a declaration continuation (a parameter
type list).
The second problem listed falls out of the "new-expression" with an
unparenthesized type. This form of freestore (such as "new int * *")
allows types to be placed adjacent to expressions, and the trailing
'*' ambiguity rears its head. I can easily prove that this is the
culprit in terms of specific ambiguities, in that removing these
(unnecessary?) forms significantly disambiguates the grammar. (It is
rather nice to use YACC as a tool to prove that a grammar is
unambiguous!). It is interesting to note that if only the derivation
of a freestore expression were limited to (using the non-terminal
names of the form that the C++ Reference manual uses):
new placement-opt ( type-name ) parened-arg-list-opt
then all the LR(1) reduce conflicts based on this problem would
vanish. Indeed, the culprit can clearly be shown to be:
new placement-opt restricted-type-name parened-arg-list-opt
The characters which excite these reduction conflicts are '*', '&',
and ':'. The context in which the ':' is significant occurs when the
freestore expression is the middle expression of the ternary operator
set "?:". In this ternary operator context, the use of a type name
such as "class a" leaves the LR(1) parser confused about the meaning
of a ':' that follows.
The third problem that I indicated involves the
conversion-function-name. Here again, if the syntax were restricted
to ONLY:
operator simple-type-name
then the LR(1) conflicts would vanish. It is interesting to note
that the keyword "operator" serves as the left separator, and the
restriction to "simple-type-name" results in an implicit right
separator (simple-type-names are exactly one token long). The
conflicts appear when multiple tokens are allowed for a declaration
specifier, and an optional pointer-modifier list may be added as a
postfix. The conflicts that result from this lack of separation
include all those provided by the freestore example, and an
additional set as well. The additional conflicts are not
semantically significant, but they are noticeable to a compiler
writer. The interesting new trailing character conflict is '{'. The
context for this conflict involves the definition of a conversion
function, which always includes a function body (with a leading
character of '{' ). A simple grammar does not SYNTACTICALLY require
that "function returning modifier" follow the
conversion-function-name in all declarations/definitions, although
semantics do require such. Hence an LR(1) conflict occurs when the
type-name is of the form "struct A", and a possible structure
elaboration may follow (with leading character '{' ).
Here again (as with the unambiguous version of freestore) the syntax
could be extended to:
operator.function.name :
OPERATOR any.operator
| OPERATOR basic.type.name
| OPERATOR TYPEDEFname
| OPERATOR type.qualifier
| OPERATOR '(' type.name ')'
;
instead of:
operator.function.name :
OPERATOR any.operator
| OPERATOR type.specifier.or.name operator.function.ptr.opt
| OPERATOR type.qualifier.list operator.function.ptr.opt
;
and the ambiguities would vanish (and the expressivity would not be
diminished).
FUNCTION LIKE CAST VS DECLARATION AMBIGUITIES
The real big culprit (i.e., my anti-favorite) in this whole ambiguity
set (re: keeping types and expressions separate) is the
function-like-cast. The reason why it is so significant (to an LR
parser) is that the binding of a type-name, when used in a
function-like-cast, is directly to the following parenthesized
argument list. In contrast, the binding of a type-name when used in
a declaration is to all the "stuff" that follows, up until a
declarator termination mark like a ',', ';' or '='. Life really gets
tough for LR folks when the parse tree MUST be reduced, but you just
can't tell how yet. With this problem, the hacks began to appear
(re: the "smart lexer"). Note that these new style casts are much
more than a notational convenience in C++. The necessity of the
function like cast lies in the fact that such a cast can take several
arguments, whereas the old style cast is ALWAYS a unary operator.
I was (past tense) actually working towards resolving this problem
via some standard methods that I have developed (re: inline expansion
of rules to provide deferred reduction). I was (past tense) also
using one more sneaky piece of work to defer the reductions, as I was
carefully making use of right recursion (instead of the standard left
recursion) in order to give the parser a chance to build up more
context. I can demonstrate the usefulness of right recursive
grammars in disambiguating difficult toy grammars. Unfortunately, I
realized at some point that I NEEDED to perform certain actions (such
as add identifiers to the symbol table) in order to complete the
parse!?! This was my catch 22. I could POSSIBLY parse using an LALR
grammar, if I could only defer actions until I had enough context to
disambiguate. Unfortunately, I HAD to perform certain actions (re:
modify the symbol table, which changed the action of the tokenizer)
BEFORE I could continue to examine tokens! In some terrible sense,
the typedef hack had come back to haunt me.
I backed off a bit in my grammar after reaching this wall, and now my
grammar only waits until it reaches the identifier in the would be
declarator. I really didn't want to parse the stuff after the
identifier name (sour grapes?), because I knew I would not (for
example) be able to identify a "constant expression" in an array
subscript (most of the time, if it isn't constant, then it can't be a
declaration). I don't believe that a compiler should compete in a
battle of wits with the programmer, and the parser was already
beginning to outwit me (i.e., I was having a hard time parsing stuff
mentally that is provided as examples in the 2.0 Reference Manual).
FUNCTION LIKE CAST vs DECLARATION : THE TOUGH EXAMPLE:
The following is about the nastiest example that I have been able to
construct for this ambiguity group. I am presenting it here just in
case someone is left with a thought that there is "an easy way out".
The fact that identifiers can remain ambiguous SO LONG after
encountering them can cause no end of trouble to the parser. The
following example does not succumb to static (re: no change to the
symbol table) anticipatory lexing of a statement. As such, it
demonstrates the futility of attempting to use a "smart lexer" to
support the philosophy: "If it can be interpreted as a declaration,
then so be it; otherwise it is an expression". This ambiguous
example exploits the fact that declarators MUST be added to the
symbol table as soon as they are complete (and hence they mask
external declarations).
First I will present the example without comments:
class Type1 {
Type1 operator()(int);
} ;
class wasType2 { ...};
int (*c2)(Type1 dummy);
main ()
{
const int a1 = 0, new_var (4), (*c1)(int (a1));
Type1 (a2) = 3, wasType2 (4), (*c2)(wasType2(a1));
}
Now to repeat the example with comments:
class Type1 {....
Type1 operator()(int);
} ;
class wasType2 { ....}; /* we will almost redeclare this typename!?! */
int (*c2)(Type1 dummy); /* we will NOT redeclare Type1 */
main ()
{
/* The first line is indeed simple. It is simply placed here
to hint at how the second line MIGHT analogously be parsed. */
const int a1 = 0, new_var (4), (*c1)(int (a1));
/* As a review, "a1" is declared to be a constant with value 0.
"new_var" is declared to be another constant, but with value
4. Finally, "c1" is declared to be a pointer to a const
integer, and the initial value of this pointer is "int(a1)",
which is the same as "int(0)", or simply "0" (a.k.a., the
null pointer). It is significant that "a1" entered the symbol
table quickly so that it could be used later in the
declaration. */
/* Static lexing of what follows will suggest that the following
is also a declaration. This statement is actually 3 comma
separated expressions!! The explanation that follows shows that
assuming the 2nd statement is a declaration leads to a
contradiction. */
Type1 (a2) = 3, wasType2 (4), (*c2)(wasType2(a1));
/* Assume this second statement is a declaration. Note that by
the time "c2" is parsed, "wasType2" has been redeclared to be a
variable of type "Type1". Hence "wasType2(a1)" is actually a
function call to "wasType2.operator()(a1)", and it is not a
function prototype arg list. It follows that
"(*c2)(wasType2(a1))" is an expression, and NOT a declarator!
Since this last entry is not a declarator, the entire statement
must be an expression (ugh! it is time to backtrack). After much
work on my part, I think it might even be a semantically valid
expression. Once this backtracking is complete, we see that the
first expression "Type1 (a2) = 3" is an assignment to a cast
expression. The second expression "wasType2 (4)", is a cast of a
constant. The third expression "(*c2)(wasType2(a1))", is an
indirect function call. The argument of the call is the result
of a cast. Note that "wasType2" is actually never redeclared,
but it was close! */
/* For those of you who can parse this stuff in your sleep, and
noticed the slight error in the above argument, I have the
following "fix". The error is that the
"(*c2)(wasType2(a1))"
could actually be a declaration with a parenthesized initializer.
I could have change this token subsequence to:
"(*(*c2)(wasType2(a1)))(int(a1))"
and avoid the constructor ambiguity, but it would only complicate
the discussion. Note that in this form, if "wasType2" is not a
type, the the quoted text cannot be a declaration.*/
/* Two parens are all a user would need to add to the cryptic
example to unambiguously specify that this statement is an
expression. Specifically: */
(Type1) (a2) = 3, wasType2 (4), (*c2)(wasType2(a1));
/* or ...*/
(Type1 (a2) = 3), wasType2 (4), (*c2)(wasType2(a1));
/* I would vote for a syntax error in such ambiguous stream, with
an early decision that it was a declaration. After seeing this
example, I doubt that I could quickly assert that I could produce
a non-backtracking parser that disambiguates statements according
to the C++ 2.0 rule. I am sure I can forget about a simple
lex-YACC combination doing it. */
}
Most simply put, if a "smart lexer" understands these: a) I am
impressed, b) Why use a parser when a lexer can parse so well?
The bottom line is that disambiguation of declarations via "If it can
be a declaration, then it is one", seems to require a backtracking
parser. (Or some very fancy parsing approach). I am not even sure if
the above examples are as bad as it can get!
CONCLUSION
I believe that the C++ grammar that I have made available represents
a viable machine readable standard for the syntax description of the
C++ language. In cases where the ambiguities are still exposed by
conflicts (as noted by YACC), to further defer resolution would be
detrimental to a user. I see no benefit in describing a computer
language that must support human writers, but cannot be understood by
humans. Any code that exploits such deferral is inherently
non-portable, and deserves to be diagnosed as an error (my grammar
asserts a "syntax error"). Rather than dragging the C++ language
into support for a ad-hoc parser implementations such as cfront (and
the "smart lexer"), I would heavily suggest the use of my grammar. I
do not believe that my grammar would "break" much existing code, but
in cases where it would, the code would not be portable anyway (other
than to a port of an IDENTICAL parser).
I hope to see a great deal of use of my grammars, and I believe that
standardizing on the represented syntax will unify the C++ language
greatly.
Jim Roskind
Independent Consultant
516 Latania Palm Drive
Indialantic FL 32903
(407)729-4348
...uunet!leafusa!jar
APPENDIX A:
PROPOSED GRAMMAR MODIFICATIONS (fixing '*', '&' and ':' conflicts)
Based on the other items described above, I have the following
suggestions for cleaning up the grammar definition. Unfortunately,
it provides subtle variations from the "C++ 2.0" standard.
Current Grammar:
operator.function.name :
OPERATOR any.operator
| OPERATOR type.specifier.or.name operator.function.ptr.opt
| OPERATOR type.qualifier.list operator.function.ptr.opt
;
operator.new.type:
type.qualifier.list operator.new.declarator.opt
operator.new.initializer.opt
| type.specifier.or.name operator.new.declarator.opt
operator.new.initializer.opt
;
Proposed new grammar (which requires parens around complex types):
operator.function.name :
OPERATOR any.operator
| OPERATOR basic.type.name
| OPERATOR TYPEDEFname
| OPERATOR type.qualifier
| OPERATOR '(' type.name ')'
;
operator.new.type:
basic.type.name operator.new.initializer.opt
| TYPEDEFname operator.new.initializer.opt
| type.qualifier operator.new.initializer.opt
| '(' type.name ') operator.new.initializer.opt
;
The impact of the above changes is that all complex type names (i.e.:
names that are not simply a typedef/class name, or a basic type names
like char) must be enclosed in parenthesis in both `new ...' and
`operator ...' expressions. Both of the above changes would clear up
a number of ambiguities. In some sense, the current "disambiguation"
(of trailing '*', '&', and ':') is really a statement that whatever
an LR(1) parser cannot disambiguate is a syntax error. In contrast,
the above rules define an unambiguous grammar.
APPENDIX B:
AMBIGUITIES IN MY C++ GRAMMAR AS LISTED BY YACC VERBOSE OPTION
The following are the list of conflicts that were reported in the
verbose output from an AT&T compatible YACC. I have only listed the
conflict states here, as the entire file is well in excess of 500K.
182: shift/reduce conflict (shift 183, red'n 286) on :
182: reduce/reduce conflict (red'ns 286 and 280 ) on {
state 182
aggregate.name : aggregate.key identifier.or.typedef.name_derivation.opt $$284 { member.declaration.list.opt }
aggregate.name : aggregate.key identifier.or.typedef.name_ (286)
derivation.opt : _ (280)
: shift 183
{ reduce 280
. reduce 286
derivation.opt goto 415
187: shift/reduce conflict (shift 427, red'n 369) on {
state 187
enum.name : ENUM identifier.or.typedef.name_{ enumerator.list }
enum.name : ENUM identifier.or.typedef.name_ (369)
{ shift 427
. reduce 369
189: shift/reduce conflict (shift 50, red'n 29) on *
189: shift/reduce conflict (shift 51, red'n 29) on &
state 189
operator.function.name : OPERATOR type.specifier.or.name_operator.function.ptr.opt
operator.function.ptr.opt : _ (29)
TYPEDEFname shift 431
* shift 50
& shift 51
. reduce 29
operator.function.ptr.opt goto 428
pointer.operator goto 429
indirect.or.reference goto 430
190: shift/reduce conflict (shift 50, red'n 29) on *
190: shift/reduce conflict (shift 51, red'n 29) on &
state 190
operator.function.name : OPERATOR type.qualifier.list_operator.function.ptr.opt
type.qualifier.list : type.qualifier.list_type.qualifier
basic.type.specifier : type.qualifier.list_basic.type.name
sue.type.specifier : type.qualifier.list_elaborated.type.name
typedef.type.specifier : type.qualifier.list_TYPEDEFname
operator.function.ptr.opt : _ (29)
DOUBLE shift 42
INT shift 39
STRUCT shift 68
LONG shift 40
ENUM shift 66
CHAR shift 37
UNION shift 69
CONST shift 63
FLOAT shift 41
SHORT shift 38
UNSIGNED shift 44
SIGNED shift 43
VOID shift 36
VOLATILE shift 64
CLASS shift 70
TYPEDEFname shift 433
* shift 50
& shift 51
. reduce 29
operator.function.ptr.opt goto 432
pointer.operator goto 429
indirect.or.reference goto 430
basic.type.name goto 151
type.qualifier goto 150
elaborated.type.name goto 152
aggregate.name goto 48
enum.name goto 49
aggregate.key goto 65
429: shift/reduce conflict (shift 50, red'n 29) on *
429: shift/reduce conflict (shift 51, red'n 29) on &
state 429
operator.function.ptr.opt : pointer.operator_operator.function.ptr.opt
operator.function.ptr.opt : _ (29)
TYPEDEFname shift 431
* shift 50
& shift 51
. reduce 29
operator.function.ptr.opt goto 636
pointer.operator goto 429
indirect.or.reference goto 430
430: shift/reduce conflict (shift 50, red'n 29) on *
430: shift/reduce conflict (shift 51, red'n 29) on &
state 430
operator.function.ptr.opt : indirect.or.reference_operator.function.ptr.opt
pointer.operator : indirect.or.reference_type.qualifier.list
operator.function.ptr.opt : _ (29)
CONST shift 63
VOLATILE shift 64
TYPEDEFname shift 431
* shift 50
& shift 51
. reduce 29
operator.function.ptr.opt goto 637
type.qualifier.list goto 166
pointer.operator goto 429
indirect.or.reference goto 430
type.qualifier goto 46
484: shift/reduce conflict (shift 50, red'n 103) on *
484: shift/reduce conflict (shift 51, red'n 103) on &
state 484
operator.new.type : type.qualifier.list_operator.new.declarator.opt operator.new.initializer.opt
type.qualifier.list : type.qualifier.list_type.qualifier
basic.type.specifier : type.qualifier.list_basic.type.name
sue.type.specifier : type.qualifier.list_elaborated.type.name
typedef.type.specifier : type.qualifier.list_TYPEDEFname
operator.new.declarator.opt : _ (103)
DOUBLE shift 42
INT shift 39
STRUCT shift 68
LONG shift 40
ENUM shift 66
CHAR shift 37
UNION shift 69
CONST shift 63
FLOAT shift 41
SHORT shift 38
UNSIGNED shift 44
SIGNED shift 43
VOID shift 36
VOLATILE shift 64
CLASS shift 70
TYPEDEFname shift 433
* shift 50
& shift 51
[ shift 678
. reduce 103
pointer.operator goto 677
indirect.or.reference goto 676
basic.type.name goto 151
operator.new.declarator.opt goto 674
operator.new.array.declarator goto 675
type.qualifier goto 150
elaborated.type.name goto 152
aggregate.name goto 48
enum.name goto 49
aggregate.key goto 65
485: shift/reduce conflict (shift 50, red'n 103) on *
485: shift/reduce conflict (shift 51, red'n 103) on &
state 485
operator.new.type : type.specifier.or.name_operator.new.declarator.opt operator.new.initializer.opt
operator.new.declarator.opt : _ (103)
TYPEDEFname shift 431
* shift 50
& shift 51
[ shift 678
. reduce 103
pointer.operator goto 677
indirect.or.reference goto 676
operator.new.declarator.opt goto 679
operator.new.array.declarator goto 675
642: reduce/reduce conflict (red'ns 17 and 22 ) on (
642: reduce/reduce conflict (red'ns 17 and 22 ) on )
642: reduce/reduce conflict (red'ns 17 and 22 ) on [
state 642
paren.identifier.declarator : rescoped.identifier_ (17)
primary.expression : rescoped.identifier_ (22)
( reduce 17
) reduce 17
[ reduce 17
. reduce 22
676: shift/reduce conflict (shift 50, red'n 103) on *
676: shift/reduce conflict (shift 51, red'n 103) on &
state 676
operator.new.declarator.opt : indirect.or.reference_operator.new.declarator.opt
pointer.operator : indirect.or.reference_type.qualifier.list
operator.new.declarator.opt : _ (103)
CONST shift 63
VOLATILE shift 64
TYPEDEFname shift 431
* shift 50
& shift 51
[ shift 678
. reduce 103
type.qualifier.list goto 166
pointer.operator goto 677
indirect.or.reference goto 676
operator.new.declarator.opt goto 802
operator.new.array.declarator goto 675
type.qualifier goto 46
677: shift/reduce conflict (shift 50, red'n 103) on *
677: shift/reduce conflict (shift 51, red'n 103) on &
state 677
operator.new.declarator.opt : pointer.operator_operator.new.declarator.opt
operator.new.declarator.opt : _ (103)
TYPEDEFname shift 431
* shift 50
& shift 51
[ shift 678
. reduce 103
pointer.operator goto 677
indirect.or.reference goto 676
operator.new.declarator.opt goto 803
operator.new.array.declarator goto 675
740: reduce/reduce conflict (red'ns 74 and 64 ) on )
740: reduce/reduce conflict (red'ns 74 and 64 ) on ,
740: reduce/reduce conflict (red'ns 74 and 64 ) on =
state 740
postfix.expression : TYPEDEFname ( )_ (74)
parameter.type.list : ( )_type.qualifier.list.opt
type.qualifier.list.opt : _ (64)
) reduce 64
, reduce 64
CONST shift 63
VOLATILE shift 64
ELLIPSIS reduce 64
= reduce 64
. reduce 74
type.qualifier.list goto 533
type.qualifier.list.opt goto 532
type.qualifier goto 46
782: shift/reduce conflict (shift 595, red'n 418) on )
state 782
class.rescoped.identifier : TYPEDEFname_CLCL identifier.or.typedef.name
class.rescoped.identifier : TYPEDEFname_CLCL operator.function.name
class.rescoped.identifier : TYPEDEFname_CLCL ~ TYPEDEFname
class.rescoped.identifier : TYPEDEFname_CLCL class.rescoped.identifier
postfix.expression : TYPEDEFname_( )
postfix.expression : TYPEDEFname_( argument.expression.list )
typedef.type.specifier : TYPEDEFname_type.qualifier
type.name : TYPEDEFname_ (418)
type.name : TYPEDEFname_abstract.declarator
paren.postfix.typedef.declarator : ( TYPEDEFname_postfixing.abstract.declarator )
simple.paren.typedef.declarator : ( TYPEDEFname_)
pointer.operator : TYPEDEFname_CLCL * type.qualifier.list.opt
( shift 687
) shift 595
CONST shift 63
VOLATILE shift 64
TYPEDEFname shift 431
CLCL shift 129
* shift 50
& shift 51
[ shift 98
. error
pointer.operator goto 684
indirect.or.reference goto 683
postfixing.abstract.declarator goto 874
type.qualifier goto 133
parameter.type.list goto 97
abstract.declarator goto 561
unary.abstract.declarator goto 548
postfix.abstract.declarator goto 549
array.abstract.declarator goto 96
874: shift/reduce conflict (shift 838, red'n 591) on )
state 874
paren.postfix.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator_)
abstract.declarator : postfixing.abstract.declarator_ (591)
) shift 838
. error
875: shift/reduce conflict (shift 840, red'n 418) on )
state 875
class.rescoped.identifier : TYPEDEFname_CLCL identifier.or.typedef.name
class.rescoped.identifier : TYPEDEFname_CLCL operator.function.name
class.rescoped.identifier : TYPEDEFname_CLCL ~ TYPEDEFname
class.rescoped.identifier : TYPEDEFname_CLCL class.rescoped.identifier
postfix.expression : TYPEDEFname_( )
postfix.expression : TYPEDEFname_( argument.expression.list )
typedef.type.specifier : TYPEDEFname_type.qualifier
type.name : TYPEDEFname_ (418)
type.name : TYPEDEFname_abstract.declarator
paren.typedef.declarator : indirect.or.reference ( TYPEDEFname_)
paren.postfix.typedef.declarator : ( TYPEDEFname_postfixing.abstract.declarator )
pointer.operator : TYPEDEFname_CLCL * type.qualifier.list.opt
( shift 687
) shift 840
CONST shift 63
VOLATILE shift 64
TYPEDEFname shift 431
CLCL shift 129
* shift 50
& shift 51
[ shift 98
. error
pointer.operator goto 684
indirect.or.reference goto 683
postfixing.abstract.declarator goto 874
type.qualifier goto 133
parameter.type.list goto 97
abstract.declarator goto 561
unary.abstract.declarator goto 548
postfix.abstract.declarator goto 549
array.abstract.declarator goto 96
876: shift/reduce conflict (shift 950, red'n 451) on ELSE
state 876
selection.statement : IF ( expression ) statement_ (451)
selection.statement : IF ( expression ) statement_ELSE statement
ELSE shift 950
. reduce 451
1026: shift/reduce conflict (shift 1066, red'n 571) on ;
state 1026
constructor.conflicting.parameter.list.and.body : ( TYPEDEFname )_;
constructor.conflicting.parameter.list.and.body : ( TYPEDEFname )_type.qualifier.list ;
constructor.conflicting.parameter.list.and.body : ( TYPEDEFname )_constructor.init.list.opt compound.statement
constructor.conflicting.parameter.list.and.body : ( TYPEDEFname )_type.qualifier.list constructor.init.list.opt compound.statement
simple.paren.typedef.declarator : ( TYPEDEFname )_ (571)
constructor.init.list.opt : _ (539)
CONST shift 63
VOLATILE shift 64
: shift 81
; shift 1066
{ reduce 539
. reduce 571
type.qualifier.list goto 1067
type.qualifier goto 46
constructor.init.list goto 1069
constructor.init.list.opt goto 1068
1065: shift/reduce conflict (shift 1102, red'n 359) on ;
state 1065
member.conflict.paren.postfix.declaring.item : TYPEDEFname ( TYPEDEFname postfixing.abstract.declarator )_member.pure.opt
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list ;
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list constructor.init.list.opt compound.statement
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_;
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_constructor.init.list.opt compound.statement
member.pure.opt : _ (359)
constructor.init.list.opt : _ (539)
CONST shift 63
VOLATILE shift 64
: shift 81
= shift 971
; shift 1102
{ reduce 539
. reduce 359
type.qualifier.list goto 1101
type.qualifier goto 46
member.pure.opt goto 1100
constructor.init.list goto 1069
constructor.init.list.opt goto 1103
1089: shift/reduce conflict (shift 1102, red'n 359) on ;
state 1089
member.conflict.paren.postfix.declaring.item : declaration.specifier ( TYPEDEFname postfixing.abstract.declarator )_member.pure.opt
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list ;
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_type.qualifier.list constructor.init.list.opt compound.statement
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_;
constructor.conflicting.typedef.declarator : ( TYPEDEFname postfixing.abstract.declarator )_constructor.init.list.opt compound.statement
member.pure.opt : _ (359)
constructor.init.list.opt : _ (539)
CONST shift 63
VOLATILE shift 64
: shift 81
= shift 971
; shift 1102
{ reduce 539
. reduce 359
type.qualifier.list goto 1101
type.qualifier goto 46
member.pure.opt goto 1128
constructor.init.list goto 1069
constructor.init.list.opt goto 1103
123/127 terminals, 160/200 nonterminals
609/650 grammar rules, 1157/1200 states
25 shift/reduce, 7 reduce/reduce conflicts reported
243/350 working sets used
memory: states,etc. 13727/60000, parser 10871/12000
495/600 distinct lookahead sets
756 extra closures
7300 shift entries, 25 exceptions
1797 goto entries
4731 entries saved by goto default
Optimizer space used: input 18036/60000, output 8092/12000
8092 table entries, 3568 zero
maximum spread: 351, maximum offset: 1148